Regionalized Policy Representation for Reinforcement Learning in POMDPs
Authors
Abstract
Many decision-making problems can be formulated in the framework of a partially observable Markov decision process (POMDP) [5]. The optimality of decisions relies on the accuracy of the POMDP model as well as the policy found for the model. In many applications the model is unknown and learned empirically based on experience, and building a model is just as difficult as finding the associated policy. Since the ultimate goal of decision making is the optimal policy, it is advantageous to learn an optimal policy directly from experience, without an intervening stage of model learning.
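The idea of learning a policy directly from experience, with no intervening model-learning stage, can be illustrated with model-free Q-learning on the classic Tiger POMDP. This is only a toy sketch, not the paper's method: the problem setup, reward values, and the one-step observation memory (standing in for a learned internal decision state) are all illustrative assumptions.

```python
# Toy sketch: model-free Q-learning on the Tiger POMDP, learning a policy
# directly from experience without estimating transition/observation models.
# All constants here (rewards, noise level) are illustrative assumptions.
import random
from collections import defaultdict

random.seed(0)

ACTIONS = ["listen", "open-left", "open-right"]

def step(state, action):
    """Hidden tiger behind the 'left' or 'right' door."""
    if action == "listen":
        # Noisy observation: hear the tiger's true side 85% of the time.
        correct = random.random() < 0.85
        obs = state if correct else ("left" if state == "right" else "right")
        return state, obs, -1.0, False
    # Opening a door ends the episode: tiger door is costly, other door pays.
    reward = -100.0 if action == f"open-{state}" else 10.0
    return state, "start", reward, True

def run_q_learning(episodes=5000, alpha=0.1, gamma=0.95, eps=0.1):
    # Q-values indexed by the most recent observation -- a crude 1-step
    # memory standing in for a learned internal/decision state.
    Q = defaultdict(float)
    for _ in range(episodes):
        state = random.choice(["left", "right"])
        obs, done = "start", False
        while not done:
            if random.random() < eps:
                a = random.choice(ACTIONS)
            else:
                a = max(ACTIONS, key=lambda x: Q[(obs, x)])
            state, next_obs, r, done = step(state, a)
            target = r if done else r + gamma * max(Q[(next_obs, b)] for b in ACTIONS)
            Q[(obs, a)] += alpha * (target - Q[(obs, a)])
            obs = next_obs
    return Q

Q = run_q_learning()
# After hearing the tiger on one side, the learned values favor opening
# the opposite door, even though the agent never built a POMDP model.
best_after_left = max(ACTIONS, key=lambda a: Q[("left", a)])
```

Note that a one-step observation memory is generally insufficient for POMDPs that require accumulating evidence over many steps, which is exactly the gap that learned decision-state representations such as the RPR aim to fill.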
Similar Papers
The Infinite Regionalized Policy Representation
We introduce the infinite regionalized policy representation (iRPR), a nonparametric policy for reinforcement learning in partially observable Markov decision processes (POMDPs). The iRPR assumes an unbounded set of decision states a priori, and infers the number of states needed to represent the policy from the experiences. We propose algorithms for learning the number of decision states while main...
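The nonparametric flavor described above can be illustrated with a stick-breaking construction, which places a prior over a conceptually unbounded set of decision states so that the number actually used is inferred from data. This is a generic Dirichlet-process sketch under assumed parameters, not the iRPR's exact model.

```python
# Illustrative sketch of a stick-breaking prior over an unbounded set of
# decision states: each break of the remaining "stick" yields the weight
# of one more state, truncated once the leftover mass is negligible.
import random

def stick_breaking_weights(alpha, tol=1e-4, seed=0):
    """Draw mixture weights over (conceptually infinite) decision states.

    alpha: concentration parameter; larger values spread mass over more states.
    tol:   truncation threshold on the remaining stick length.
    """
    rng = random.Random(seed)
    weights, remaining = [], 1.0
    while remaining > tol:
        # Beta(1, alpha) draw via inverse CDF: v = 1 - U**(1/alpha).
        v = 1.0 - rng.random() ** (1.0 / alpha)
        weights.append(remaining * v)
        remaining *= 1.0 - v
    return weights

w = stick_breaking_weights(alpha=2.0)
# The weights sum to ~1, and the number of non-negligible entries is
# determined by the draw itself rather than fixed in advance.
```

In a model like the iRPR, such weights would parameterize how experience is allocated across decision states, letting the effective number of states grow with the data.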
Deep Reinforcement Learning with POMDPs
Recent work has shown that Deep Q-Networks (DQNs) are capable of learning human-level control policies on a variety of different Atari 2600 games [1]. Other work has looked at treating the Atari problem as a partially observable Markov decision process (POMDP) by adding imperfect state information through image flickering [2]. However, these approaches leverage a convolutional network structure...
Data-Efficient Reinforcement Learning in Continuous-State POMDPs
We present a data-efficient reinforcement learning algorithm resistant to observation noise. Our method extends the highly data-efficient PILCO algorithm (Deisenroth & Rasmussen, 2011) to partially observable Markov decision processes (POMDPs) by considering the filtering process during policy evaluation. PILCO conducts policy search, evaluating each policy by first predicting an analytic distr...
Coordinated Multi-Agent Reinforcement Learning in Networked Distributed POMDPs
In many multi-agent applications such as distributed sensor nets, a network of agents acts collaboratively under uncertainty and local interactions. The Networked Distributed POMDP (ND-POMDP) provides a framework to model such cooperative multi-agent decision making. Existing work on ND-POMDPs has focused on offline techniques that require accurate models, which are usually costly to obtain in pract...
Proposal for an Algorithm to Improve a Rational Policy in POMDPs
Reinforcement learning is a kind of machine learning. The Partially Observable Markov Decision Process (POMDP) is a representative class of non-Markovian environments in reinforcement learning. The Rational Policy Making algorithm (RPM) learns a deterministic rational policy in POMDPs. Though RPM can learn a policy very quickly, it needs numerous trials to improve the policy. Furthermore ...